Eric Davidson at Caltech has spent 
several decades investigating the 
molecular basis of animal development 
using the sea urchin embryo as an experimental 
system, 1,2 although his scholarship 
extends to all of embryology as embodied 
in several editions of his landmark book. 3 
In recent years his laboratory has become 
a leading force in constructing gene 
regulatory networks (GRNs) operating 
in sea urchin development. 4 This axis of 
his work has its roots in this laboratory's 
cDNA cloning of an actin mRNA from 
the sea urchin embryo (for the time- 
line, see ref. 1) — one of the first eukary- 
otic mRNAs to be cloned as it turned 
out. From that point of departure, the 
Davidson lab has drilled down into other 
genes and gene families and the factors 
that coordinate their regulation, 
leading them into the GRN era 
(a field they helped to define) and the 
development of the computational tools 
needed to consolidate and advance the 
GRN field. 

The nodes in a GRN represent regula- 
tory genes (genes that encode transcrip- 
tion factors and signaling molecules) and 
the edges represent regulatory logic and 
gene interactions encoded in genomic cis- 
regulatory binding sites. GRNs are con- 
structed using four types of experimental 
data: temporal expression patterns of regu- 
latory genes, spatial expression patterns of 
regulatory genes, trans-perturbation data 
(the change in gene expression patterns 
when one or more regulatory genes are 
perturbed) and cis-perturbation data (the 
change in gene expression patterns when 
one or more cis-regulatory binding sites 
are mutated). In principle, GRNs provide 
a system-level, causal model for the spatial 
and temporal patterns of gene expression 
and mechanistic understanding of how a 
zygote develops into an embryo. 

The decades of effort by Davidson 
and colleagues have resulted in a GRN 
for endomesoderm specification in the sea 
urchin embryo, one of the most complete 
GRNs currently available for animal 
development. Yet, this GRN is a static 
conglomeration of heterogeneous experi- 
mental data. It is unknown whether this 
GRN is complete; more importantly, it 
is unknown whether it can predict the 
progression of the large-scale developmental 
process. In a recent PNAS publication, 5 
Peter, Faure and Davidson built 
a Boolean computation model based on 
this GRN in order to address these two 
questions. They analyzed the latest ver- 
sion of the endomesoderm GRN which 
contained 50 regulatory genes plus the 
regulatory interactions controlling the 
specification of endoderm and mesoderm 
from early cleavage stages (six hours post- 
fertilization) to the onset of gastrulation 
(30 h). They argued that spatial expres- 
sion was discrete in nature and could 
be captured by "on" or "off" states in a 
Boolean model. Another key assumption 
in their model was that the time interval 
between the activation of a regulatory 
gene and its immediate downstream reg- 
ulatory gene in the GRN is three hours 
for all the genes in the GRN, with this 
assumption resting on the notion that 
this is the amount of time required for 
transcription of the upstream gene, synthesis 
of the protein (assuming a much longer 
protein half-life) and binding 
of the protein to the cis-regulatory region 
of the downstream gene. They converted 
the GRN to a set of 75 vector equations 
(e.g., if gene A in domain 1 is on and 
gene B in domain 2 is off, then gene C 
in domain 1 is on; otherwise gene C in 
domain 1 is off, etc.) and tested whether, 
starting with non-zygotic (i.e., maternal) 
inputs as the initial state, the model 
could run autonomously and reproduce 
the temporal and spatial expression pat- 
terns during the developmental processes 
of four cellular domains of the sea urchin 
embryo. 
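
To make the form of such a rule set concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' implementation: the gene names A, B and C, the domain names and the single rule are hypothetical (taken from the example above), and each synchronous update is assumed to stand for the uniform three-hour lag.

```python
from typing import Callable, Dict, List, Tuple

# A state maps (gene, domain) to True ("on") or False ("off").
State = Dict[Tuple[str, str], bool]
Rule = Callable[[State], bool]

# Hypothetical rule set; genes without a rule keep their current value.
RULES: Dict[Tuple[str, str], Rule] = {
    # "If gene A in domain 1 is on and gene B in domain 2 is off,
    # then gene C in domain 1 is on; otherwise it is off."
    ("C", "domain1"): lambda s: s[("A", "domain1")] and not s[("B", "domain2")],
}

STEP_HOURS = 3  # assumed uniform lag between a regulator and its target


def step(state: State) -> State:
    """Synchronous update: every rule reads the previous state."""
    new_state = dict(state)
    for target, rule in RULES.items():
        new_state[target] = rule(state)
    return new_state


def simulate(initial: State, start_hour: int, end_hour: int) -> List[Tuple[int, State]]:
    """Return [(hour, state), ...] from start_hour to end_hour inclusive."""
    state = dict(initial)
    trajectory = [(start_hour, state)]
    for hour in range(start_hour + STEP_HOURS, end_hour + 1, STEP_HOURS):
        state = step(state)
        trajectory.append((hour, state))
    return trajectory


# Hypothetical maternal inputs used as the initial state at 6 h.
maternal_inputs: State = {
    ("A", "domain1"): True,
    ("B", "domain2"): False,
    ("C", "domain1"): False,
}
trajectory = simulate(maternal_inputs, start_hour=6, end_hour=30)
```

In this toy case gene C in domain 1 is off at 6 h and switches on at the first update (9 h), because its rule reads the state of A and B one step earlier.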

Implications and Speculations 

Encouragingly, the Boolean model was 
able to make highly accurate predictions 
on spatial and temporal gene expression 
patterns. Indeed, only a few predictions 
among the 2,772 time-space-gene combinations 
were at odds with experimental 
data. This may not seem surprising 
because the GRN is based on interpreta- 
tions of huge masses of expression data, 
perturbation data and cis-regulatory data. 
Peter et al. proceeded to perform more 
stringent tests of the model by asking 
how it would respond to four perturba- 
tions: extinction of the expression of the 
delta gene, global expression of the pmar1 
gene, extinction of hox11/13b expression, 
and, most challengingly, the transplantation 
of four cleavage skeletogenic micromeres 
into the animal pole of an otherwise 
normal embryo that possessed its own set 
of four micromeres at the vegetal pole. 
(The intra-embryo transposition of blastomeres 
harkens back to the classical era 
of experimental embryology.) The authors 
emphasized that, except for the hox11/13b 
test, the perturbation results they sought 
to reproduce were not used in building the 
GRN. (Transplanted micromeres could 
of course not have been part of the GRN, 
which is for a normal embryo.) The results 
of these perturbation tests were in nearly 
perfect agreement with the experimental 
data, which led the authors to two conclusions: 
the GRN contained sufficient 
information to provide a system-level 
causal explanation for sea urchin development, 
and the Boolean computational 
model was a useful tool for testing GRNs 
in silico and predicting the effects of 
perturbations. 
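
An in silico perturbation of this kind can be sketched as clamping a gene to a fixed value while the rest of the network runs freely. The Python below is a hedged illustration, not the published model: the two-gene network (X activates Y in domain "d"), the `clamp` argument and the "observed" values are all hypothetical placeholders, not real measurements.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[Tuple[str, str], bool]
Rule = Callable[[State], bool]


def run(rules: Dict[Tuple[str, str], Rule], initial: State, n_steps: int,
        clamp: Optional[State] = None) -> List[State]:
    """Synchronous Boolean simulation; clamped entries are forced at every step."""
    clamp = clamp or {}
    state = {**initial, **clamp}
    history = [state]
    for _ in range(n_steps):
        nxt = dict(state)
        for target, rule in rules.items():
            nxt[target] = rule(state)
        nxt.update(clamp)  # the perturbation overrides any rule output
        state = nxt
        history.append(state)
    return history


def mismatches(predicted: State, observed: State) -> List[Tuple[str, str]]:
    """List the (gene, domain) entries where the model disagrees with the data."""
    return [key for key, value in observed.items() if predicted[key] != value]


# Hypothetical toy network: gene X activates gene Y in domain "d".
rules: Dict[Tuple[str, str], Rule] = {("Y", "d"): lambda s: s[("X", "d")]}
initial: State = {("X", "d"): True, ("Y", "d"): False}

control = run(rules, initial, n_steps=2)
# Extinction of X expression: clamp X off. Global expression would clamp it on.
knockdown = run(rules, initial, n_steps=2, clamp={("X", "d"): False})

# Placeholder "observation" for the knockdown, invented for illustration only.
placeholder_observed: State = {("Y", "d"): False}
disagreements = mismatches(knockdown[-1], placeholder_observed)
```

Counting the entries returned by `mismatches` over every time point, domain and gene is one simple way to score a model against data, in the spirit of the time-space-gene comparison described above.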

The test of blastomere transplantation 
demonstrated the critical role of intercel- 
lular signaling between the different spa- 
tial domains in development. The cells 
in these domains obviously need to work 
cooperatively to ensure precise and robust 
developmental progression and thus 
information on the regulatory state of 
each cell must be able to diffuse spatially. 
Peter et al. defined a Janus factor for every 
inductive signaling interaction in their 
Boolean model. It would be interesting 
to investigate the molecular basis of intercellular 
signaling here. Many examples of such events 
are known in development, 
the Wnt pathway in Drosophila 
being just one, 
but we have no case in which 
a paracrine signaling pathway amidst a 
developmentally determinative cluster 
of cells can be put into the context of a 
GRN as detailed as the one Peter et al. 
have defined. From the information the- 
ory perspective, intercellular signaling 
can be regarded as communicating mes- 
sages between two or more intelligent 
agents. Thus, the developmental process 
is also a diffusion process of the genomic 
regulatory information. Related domains 
of this emerging field include "molecular 
information theory" 6 and information 
networks in the data mining field 7 from 
both of which the modeling of intercel- 
lular signaling may benefit. 

Peter et al. went so far as to suggest 
that the gene regulatory models could 
sufficiently explain all the gene expres- 
sion patterns in sea urchin development, 
without considering non-coding RNAs. 
Yet a recent study identified long 
noncoding RNAs (lncRNAs) in zebrafish 
embryogenesis 8 and revealed that 
lncRNAs were specifically enriched in 
early-stage embryos. lncRNAs of course 
must be integrated into all current GRN 
research. Ironically, Davidson and his 
longtime partner Roy Britten were the 
first to postulate such a role. 9,10 

Peter et al. concluded that Boolean 
modeling is a useful implementation of 
the original GRN concept. Is there any 
information loss from the continuous 
data to Boolean data, from gene regula- 
tory logic to Boolean logic? What would 
be a good cutoff for converting the expres- 
sion level to "on" or "off" ? Does the cutoff 
depend on the gene or the spatial domain? 
If a GRN is not available in a given case, 
can a Boolean model be directly inferred 
and tested using raw experimental data? 
If so, what kinds of experimental data are 
most suitable and in what way can the 
causal structure be best captured in the 
Boolean model? Theoretical frameworks 
of causal inference from observational 
data have been studied in machine learn- 
ing for decades. 11 Possibly the established 
causal inference algorithms can expand 
this newest chapter in GRN-ology from 
the pioneering Davidson lab into more 
general settings. Can this work pro- 
vide insights into other animal develop- 
ment, e.g., mouse, human or into human 
embryonic stem (ES) cell differentiation? 
What are the conserved parts and new 
features across species? Is it possible to 
apply the general computational frame- 
work to test GRN models derived from 
biological processes in addition to devel- 
opment, such as immunology and post- 
natal neurogenesis in the subventricular 
zone of the brain to mention only two of 
many frontiers before us that beckon for 
the GRN approach? With advances 
in next-generation sequencing technology 
and its cost now falling rapidly, a 
large amount of data will soon be in hand 
for various biological processes across 
diverse phyla. Much of this momentum 
now comes from the sea urchin embryo 
and from the scientific mind of Eric 
Davidson. 